Exhaustive enumeration unveils clustering and freezing in random 3-SAT 
We study geometrical properties of the complete set of solutions of the random 3-satisfiability
problem. We show that even for moderate system sizes the number of clusters corresponds surprisingly
well with the theoretical asymptotic prediction. We locate the freezing transition in the space
of solutions, which has been conjectured to be relevant in explaining the onset of computational
hardness in random constraint satisfaction problems.
Satisfiability (SAT) is one of the most important problems
in theoretical computer science. It was the first
problem shown to be NP-complete [l|, Q , and it is of cen- 
tral relevance in various practical applications, including 
artificial intelligence, planning, hardware and electronic 
design, automation, verification and more. It can thus 
be pictorially thought of as the Ising model of computer 
science. Ensembles of randomly generated SAT instances 
emerged in computer science as a way of evaluating algo- 
rithmic performance and addressing questions regarding 
the average case complexity. 

An instance of the random K-SAT problem consists of N
Boolean variables and M clauses. Each clause contains
a subset of K distinct variables chosen uniformly at random,
and each clause forbids one random assignment of
the K variables out of the 2^K possible ones. The problem
is satisfiable if there exists a variable assignment that
simultaneously satisfies all clauses, and we call such an
assignment a solution to the problem. When the density
of constraints a = M/N is increased, the formulas
become less likely to be satisfiable. In the thermodynamic
limit there is a sharp transition from a phase
in which the formulas are almost surely satisfiable to a
phase where they are almost surely unsatisfiable. The ex- 
istence of this transition is partly established rigorously 
. It is also a well known empirical result that the hard- 
est instances are found near to this threshold [1, 0| . 
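
The ensemble defined above can be illustrated with a minimal sketch. The function names (`random_ksat`, `satisfies`, `all_solutions`) and the clause encoding as (variable, forbidden value) pairs are assumptions made for this illustration, not part of the original work, and the brute-force enumeration is only practical for very small N.

```python
import itertools
import random


def random_ksat(n, m, k=3, rng=None):
    """Draw m clauses; each clause is a tuple of (variable, forbidden_value) pairs."""
    rng = rng or random.Random()
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(n), k)                  # k distinct variables
        forbidden = [rng.randint(0, 1) for _ in variables]   # the single forbidden assignment
        clauses.append(tuple(zip(variables, forbidden)))
    return clauses


def satisfies(assignment, clauses):
    """A clause is violated only if every variable in it takes its forbidden value."""
    return all(any(assignment[v] != f for v, f in clause) for clause in clauses)


def all_solutions(n, clauses):
    """Enumerate all 2^n assignments and keep the satisfying ones (tiny n only)."""
    return [a for a in itertools.product((0, 1), repeat=n) if satisfies(a, clauses)]
```

For example, `all_solutions(12, random_ksat(12, 48))` enumerates the solutions of one instance at density M/N = 4.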

Random K-SAT has attracted the interest of statistical
physicists because of its equivalence to mean field spin 
glasses Q- Indeed, the problem can be rephrased as 
minimizing a spin glass-like energy function which counts 
the number of violated clauses. The results and insights 
coming from this equivalence are remarkable. The sat- 
isfiability threshold and other phase transitions in the 
structure of solutions are described in [^, [13]. In particular,
it was shown that for K ≥ 3 the space of solutions
for highly constrained but still satisfiable instances splits 
into exponentially many clusters and in some cases this 
clustering has been rigorously confirmed ll|, llj]. The
so-called freezing of variables in clusters is another rich
concept studied recently [3, [S]- However, a detailed
understanding of how the clustering or freezing of solutions
affects the average computational hardness is still
one of the most interesting open problems in the field. 

Since the exact statistical physics solution of the 
random satisfiability problem appeared 1, M dozens 
of directly related articles followed. Mathematicians
and computer scientists nowadays regard these analytical
works as a rich source of results which are mostly inaccessible
to the current probabilistic methods. Yet none of
these works tried to compare the analytical asymptotic 
predictions to numerical simulations on a quantitative 
level and numerical investigations mostly concentrated 
on performance analysis of satisfiability solvers. There- 
fore the relevance of the asymptotic predictions for sys- 
tems of practical sizes, which in computer science are not 
at the scale of the Avogadro number, remained almost 
untouched. Our letter aims at filling this gap and at encouraging
further investigation in this direction. We use
conceptually relatively simple numerical techniques and 
yet obtain nontrivial results. We present two of our most 
interesting findings. The first is a quantitative compar- 
ison between the number of clusters of solutions (glassy 
states) and its analytical prediction [l^, 3, ■ The 
second is the location of the freezing transition which was 
recently suggested to be responsible for the computational
hardness of the random satisfiability problem 14 , , 
but not yet computed in the 3-SAT problem. 

Clustering and freezing — In physics of glassy sys- 
tems, clusters correspond to pure thermodynamical 
states and have been described in the literature on
glasses and spin glasses for more than a quarter of a
century f^. A formal definition of clusters in K-SAT as 
extremal Gibbs measures was given recently in [10]. We
will refer to these as the cavity-clusters. It is not known, 
however, how to adapt this definition to instances of fi- 
nite size. In this work, we define clusters as connected 
components in a graph where each solution is a vertex 
and where edges connect solutions that differ in only one 
variable [s^ . This definition is applicable to any finite 
instance of the K-SAT problem. It is most likely not 
strictly equivalent to the definition of the cavity-clusters, 
yet it reproduces many of their properties.

In order to shed light on the relation between cavity- 
clusters and connected-component clusters we now intro- 
duce the procedure called whitening and the concept of 
frozen variables. Whitening of a solution in K-SAT is
defined in the following way [20]: start with the solution and
iteratively assign a "*" (joker) to variables which belong
only to clauses which are already satisfied by another
variable or already contain a * variable [38]. Whitening
is also referred to in the literature as peeling [21] or coarsening.
The fixed point of this procedure is called a
whitening- core, it is also referred as core 12)l2l|, or true 
cover [22]. A variable is said to be frozen in a set of solutions
if it takes only one value (either 0 or 1) in all the
solutions in the set. Note that if the satisfiability thresh- 
old is sharp there cannot be a finite fraction of variables 
frozen in all the solutions in the satisfiable region [2^. 
On the other hand, variables might be frozen in the individual
clusters. According to the cavity method [13, 2^
this is indeed the case and freezing of clusters has been
studied in [HQ [ill . 
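
The whitening procedure and the notion of frozen variables can be sketched as follows. This is an illustrative reading of the definition given above, not the authors' code: the names `whitening_core`, `clause_releases` and `frozen_variables` are hypothetical, and clauses are assumed to be encoded as tuples of (variable, forbidden value) pairs.

```python
def clause_releases(v, clause, state):
    """True if the clause does not pin v: another variable in it is '*' or satisfies it."""
    return any(u != v and (state[u] == "*" or state[u] != f) for u, f in clause)


def whitening_core(assignment, clauses):
    """Iterate the whitening rule to its fixed point; entries are 0, 1 or '*'."""
    state = list(assignment)
    changed = True
    while changed:
        changed = False
        for v in range(len(state)):
            if state[v] == "*":
                continue
            own_clauses = [c for c in clauses if any(u == v for u, _ in c)]
            # v becomes a joker if every clause containing it is covered without v.
            if all(clause_releases(v, c, state) for c in own_clauses):
                state[v] = "*"
                changed = True
    return state


def frozen_variables(solutions):
    """Indices of variables taking the same value in every solution of the set."""
    n = len(solutions[0])
    return [v for v in range(n) if len({s[v] for s in solutions}) == 1]
```

A solution is called unfrozen in the sense used here when `whitening_core` returns the all-`*` configuration.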

According to the cavity method 3, 2^ there is a deep 
connection between frozen variables and the whitening- 
core: if the one-step replica symmetric solution is correct 
then on large typical instances the set of frozen variables 
in the cavity-cluster and the non-* part of the whitening 
core are identical. Thus the whitening cores of all solutions
belonging to one cavity-cluster are identical. This
also holds for the connected-component clusters: Indeed, 
two solutions that differ in a single variable have the same 
whitening core since the whitening can be started from 
that specific variable [38]. Further, variables belonging
to the whitening core must be frozen in the connected-component
cluster; the opposite implication is in general
not true 0. 

Two additional remarks about clusters are important. 
First, whitening cores are sometimes wrongly identified 
with clusters. In part of the clustered phase almost all solutions
belong to soft (unfrozen) cavity-clusters [IJ,
In particular in 3-SAT this seems to be the case at least 
up to constraint density a = 4.25. Second, it seems
that all known heuristic algorithms need an exponen- 
tial time to find solutions with a non-trivial (not all-*) 
whitening cores, see e.g. 14, 21, [2^. This motivates 
our study of the freezing transition, a_f. It is defined as
the smallest density of constraints a such that all solutions
belong to frozen clusters, i.e., their whitening core
is not all-*. We use the whitening core instead
of the real set of frozen variables, because in small
instances there are almost always at least a few frozen variables.
The existence of the frozen phase was proven in the
thermodynamical limit for K > 9 near to the satisfiabil- 
ity threshold in . Several theoretical investigations of 
a related rigidity transition, where clusters which contain 
almost all the solutions become frozen, can be found in
I3I 14 , 2^ . But as long as soft clusters exist some algorithms
may be able to find them, as shown in [19]. A related
numerical study [24] investigates the size dependence
of the fraction of frozen solutions at a = 4.20 < a_f.

The complexity function — We generate instances of 
random 3-SAT problems with N variables and M clauses 
using the makewff program [2^. The number of solutions
is then calculated using the exhaustive search method
relsat [30], and the complete set of solutions is clustered
through breadth-first search. This works as follows: we order
the $\mathcal{N}$ solutions in binary lexical order. Then, for
each solution, we generate all N neighboring configurations,
search for them in the list and, if found, merge
the two into the same cluster. This results in an algorithmic
complexity of $O(\mathcal{N}\log^2\mathcal{N})$, considering that $\log\mathcal{N} \approx N$.
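
The clustering step described above can be sketched as follows. Solutions are assumed to be encoded as N-bit integers, each of the N single-flip neighbors is looked up by binary search in the sorted list, and merging is done with a union-find structure. The union-find bookkeeping and the name `cluster_solutions` are choices made for this sketch, not details taken from the original implementation.

```python
from bisect import bisect_left


def cluster_solutions(solutions, n):
    """Group n-bit solutions (given as ints) into connected components under single flips."""
    sols = sorted(set(solutions))          # binary lexical order
    parent = list(range(len(sols)))

    def find(i):
        # Find the cluster representative, shortening paths along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, s in enumerate(sols):
        for bit in range(n):
            t = s ^ (1 << bit)             # neighbor differing in one variable
            if t < s:
                continue                   # each pair is handled from its smaller member
            j = bisect_left(sols, t)       # binary search in the sorted list
            if j < len(sols) and sols[j] == t:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri        # merge the two clusters

    clusters = {}
    for i, s in enumerate(sols):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())
```

Each of the solutions triggers N binary searches, which matches the cost estimate quoted in the text up to the near-constant overhead of the union-find operations.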
FIG. 1: The average complexity function, logarithm of the
number of connected-component clusters divided by N, for
different system sizes compared to the asymptotic prediction 
[isl . ITtI ) . Note that the numerical curves will continue to much 
lower values of a than plotted. 

In order to obtain information about clusters in a typ- 
ical formula with N variables and M clauses, we count 
the number of solutions in A = 999 such random formulas
and select the median instance in terms of the number
of solutions, on which we then count the number of clusters
S. This is repeated B = 1000 times. The complexity
function $\Sigma(N) = (\log S)/N$ is then computed as the average
of the logarithm of the number of clusters divided by the
system size N. If the median instance is unsatisfiable,
it contributes a zero value to the average; this does not
influence the asymptotic value. Taking the median
has two important advantages: first, we avoid rare
formulas with very many solutions, which are numerically
intractable; second, the complexity converges very fast to
zero in the unsatisfiable region. The result is plotted
in Fig. 1 and compared with the asymptotic complexity
function computed from the survey propagation equa- 
tions, which in 3-SAT gives a non-zero result for a > 3.92 
ll,[i3- The agreement is remarkably good, in particular 
around the satisfiability threshold a_s = 4.267 [
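
The median-based averaging described above can be sketched as follows, under strong simplifying assumptions: solutions are found by brute force, so only very small N are feasible, and the default batch sizes are far smaller than the A = 999 and B = 1000 used in the text. All function names are hypothetical. Unlike the procedure in the text, which only counts solutions before enumerating the median instance, this sketch keeps every solution set in memory.

```python
import itertools
import math
import random


def random_3sat(n, m, rng):
    """m clauses, each a tuple of (variable, forbidden_value) pairs on 3 distinct variables."""
    return [tuple((v, rng.randint(0, 1)) for v in rng.sample(range(n), 3))
            for _ in range(m)]


def solutions(n, clauses):
    return [a for a in itertools.product((0, 1), repeat=n)
            if all(any(a[v] != f for v, f in c) for c in clauses)]


def count_clusters(sols):
    """Number of connected components under single-variable flips (graph traversal)."""
    remaining = set(sols)
    count = 0
    while remaining:
        count += 1
        stack = [remaining.pop()]
        while stack:
            s = stack.pop()
            for i in range(len(s)):
                t = s[:i] + (1 - s[i],) + s[i + 1:]
                if t in remaining:
                    remaining.remove(t)
                    stack.append(t)
    return count


def complexity(n, m, num_formulas=9, repeats=10, seed=0):
    """Average of log(number of clusters)/n over the median instances of each batch."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        batch = [solutions(n, random_3sat(n, m, rng)) for _ in range(num_formulas)]
        batch.sort(key=len)
        median = batch[len(batch) // 2]    # an odd batch size gives a unique median
        if median:                         # an unsatisfiable median contributes zero
            total += math.log(count_clusters(median)) / n
    return total / repeats
```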
It was discussed in [10], and shown numerically in [24],
that clusters exist even for a < 3.92. We indeed do not
see anything particular happening at a = 3.92. Below
the clustering transition, a < 3.86, however, the largest
cavity-cluster should contain almost all the solutions [10].
We see a corresponding trend in the average fraction of
solutions covered by the largest cluster in our data. It
should also be mentioned that the survey propagation
prediction is believed to be exact only for a > 4.15 [16].

The freezing transition — In order to determine the 
freezing transition we start with a formula of N variables 
and all possible clauses, and remove the clauses one by 
one independently at random. We mark the number of 
clauses M_s where the formula becomes satisfiable as well
as the number of clauses M_f ≤ M_s where at least one solution
starts to have an all-* whitening core. We repeat
B times (B = 2·10^ in Fig. 2) and compute the probabilities
that a formula of M clauses is satisfiable, P_s(a, N),
and unfrozen, P_f(a, N), respectively. Due to memory limitations
we can treat only instances which have fewer than
5·10^ solutions, which limits us to system sizes N < 100.
Our results for the satisfiability threshold are consistent 
with previous studies in 0, [23, [3l| . The probability of 
being unfrozen, P_f(a, N), is shown in Fig. 2.
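
The clause-removal procedure can be illustrated with a brute-force sketch for very small N. It assumes clauses encoded as (variable, forbidden value) pairs and recomputes the full solution set after each removal, which is far less efficient than what is needed for the system sizes in the text. The names `all_clauses`, `is_unfrozen`, `removal_run` and `probability_at` are hypothetical.

```python
import itertools
import random


def all_clauses(n, k=3):
    """Every possible clause: each k-subset of variables with each forbidden assignment."""
    return [tuple(zip(vs, fs))
            for vs in itertools.combinations(range(n), k)
            for fs in itertools.product((0, 1), repeat=k)]


def solutions(n, clauses):
    return [a for a in itertools.product((0, 1), repeat=n)
            if all(any(a[v] != f for v, f in c) for c in clauses)]


def is_unfrozen(assignment, clauses):
    """True if whitening reduces the solution to the all-* configuration."""
    state = list(assignment)
    changed = True
    while changed:
        changed = False
        for v in range(len(state)):
            if state[v] != "*" and all(
                any(u != v and (state[u] == "*" or state[u] != f) for u, f in c)
                for c in clauses if any(u == v for u, _ in c)
            ):
                state[v] = "*"
                changed = True
    return all(x == "*" for x in state)


def removal_run(n, rng):
    """Remove clauses in random order; return (M_s, M_f) for this run."""
    clauses = all_clauses(n)
    rng.shuffle(clauses)
    m_s = m_f = None
    while clauses and m_f is None:
        clauses.pop()
        sols = solutions(n, clauses)
        if sols and m_s is None:
            m_s = len(clauses)             # first satisfiable formula
        if any(is_unfrozen(s, clauses) for s in sols):
            m_f = len(clauses)             # first formula with an unfrozen solution
    return m_s, m_f


def probability_at(m, thresholds):
    """Fraction of runs whose threshold is at least m (both properties are monotone in M)."""
    return sum(1 for t in thresholds if t >= m) / len(thresholds)
```

Collecting the M_s and M_f values of many runs and evaluating `probability_at` on each list gives estimates of P_s and P_f as functions of M.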

It is tempting to perform a scaling analysis as has 
been done in [J 0, |3l[ for the satisfiability threshold. 
The critical exponent related to the width of the scaling
window was defined via rescaling of the variable a as
$N^{1/\nu_s}(1 - a/a_s(K,N))$. Note, however, that the estimate
$\nu_s = 1.5 \pm 0.1$ for 3-SAT provided in [31] is not
the correct asymptotic value. It was proven in [32] that
$\nu_s \ge 2$. Indeed it was shown numerically in [33] that a
crossover exists at sizes of order N ≈ 10^ in the related
XOR-SAT problem. A similar situation happens for the
scaling of the freezing transition, P_f(a, N), as the proof
of [32] applies also here [41]. It would be interesting to investigate
the scaling behavior on an ensemble of instances
where results of [32] do not apply. Here we concentrate 
instead on the estimation of the critical point, which we 
presume not to be influenced by the crossover in the scal- 
ing. We are in a much more convenient situation than 
for the satisfiability transition. The crossing point for the 
functions P_f(a, N) of different system sizes seems not to
depend on N, while for the satisfiability transition its
size dependence is very strong [sij. 

We determine the value of the freezing transition as 
a_f = 4.254 ± 0.009, which is extremely close to the satisfiability
threshold a_s = 4.267 [13]. Analytical study
suggests a_f > 4.25 [27]. We expect the two transitions
to be separated, a_f < a_s [l^, llJ, l26| , and Fig. 2 suggests
so, but it is on the border of statistical significance. However,
the main motivation to study the freezing transition
is its potential connection to the onset of algorithmic
hardness. We thus compare its value with the
estimates of performance of the best algorithms known 
for random 3-SAT. The leading stochastic local search 
FIG. 2: Top: Probability that there exists an unfrozen solution
as a function of the constraint density a for different
system sizes. The clustering and satisfiability f9| transi- 
tions are marked for comparison. Bottom: A 1:20 zoom on 
the critical (crossing) point; our estimate for the freezing transition
is a_f = 4.254 ± 0.009. The curves are cubic fits in the
interval (4,4.4). The arrows represent estimates of the lim- 
its of performance of the best known stochastic local search 
[2I . [3^ and survey propagation [s^, [s^ algorithms. 



algorithms work in linear time up to a = 4.21 2^, 34 1. 
The survey propagation (SP) decimation was estimated 
to work up to a = 4.252 [35]; the same point was determined
as the limit of the SP reinforcement [3^ . The
agreement between our location of the freezing transition
and the performance of SP strongly supports the
conjecture that the frozen phase is hard for any known 
algorithm. In random 3-SAT this region is very narrow, 
in contrast to the situation in K > 9 SAT [l^ . 

Discussion — The main contribution of this work is 
the demonstration that the asymptotic predictions com- 
ing from the statistical physics analysis are relevant even 
for instances of very moderate size. In particular, we presented
a numerical comparison between the number of
connected-component clusters and the asymptotic prediction
for the complexity function in random 3-SAT
and obtained a remarkably good agreement. Furthermore,
we estimated the location of the freezing transition at
a_f = 4.254, which is consistent with the performance
threshold of the best known algorithms. We also show 
that exhaustive enumeration, despite its current size lim- 
itations, is a powerful tool to study random optimization 
problems: indeed the knowledge of the complete set of solutions
allows one to tackle questions that are complementary
to those answered by classical Monte-Carlo methods. 

The definitions of clusters and of the whitening core that
we adopted are applicable to any instance of the satisfiability
problem. As such, they offer an interesting direction
for future research on real-world K-SAT instances. In
addition, we observe that the properties related to clus- 
tering are less sensitive to finite-size effects than the ones 
related to the solutions themselves. This is interesting 
and certainly worth further investigations. Future work 
could also cover 2-SAT, where the solutions are much 
more numerous even for very small system sizes, or K-SAT
with K > 3, where larger formulas will be needed
to investigate the relevant regions; however, the freezing
transition is more separated from the satisfiability threshold when
K grows. The numerical location of the clustering and
condensation transitions is also of interest. 